Abstract: Identifying communities in social networks is becoming an
increasingly important research problem. Several methods for
identifying such groups have been developed; however, qualitative
analysis still poses serious problems because of the scale of the
data. This paper describes a tool that facilitates such an analysis
by visualizing group dynamics and supporting the localization of
events such as the creation or merging of groups. In the final part
of the paper, experiments performed on benchmark data (Enron emails)
give an insight into the usefulness of the proposed tool.

I. Introduction

Current research on the identification of groups in complex networks
tends to go beyond static analysis (see, e.g., [1], [2]) and takes
into account the dynamic character of the environment, mostly through
quantitative analysis of such dynamic groups. Qualitative analysis is
much harder because of the huge network sizes, the possible number of
groups and their time dependence. In this paper we present GEVi
(Group Evolution Visualisation), a tool for the graphical analysis of
the evolution of groups.

Real-life networks change rapidly, and the groups that can be located
in them are mostly short-lived and elusive. To analyse processes or
trends occurring in groups, different time periods should be taken
into account. Observation of the changes should make it possible to
state the reasons for the creation, extension or disappearance of
certain groups. An additional challenge is that one user may be a
member of many groups. Correlating the observed network dynamics with
external events may help to explain certain processes occurring in
the structure of groups and allow the prediction of future events.

The paper first presents the state of the art and describes the
method used for group extraction. Then the features of the tool are
shown and the experimental results obtained on the popular Enron
dataset are discussed.

II. Related Work 

Initially, finding groups (communities) in large social networks made
it possible to extract certain features from the network and analyze
them at a higher level of abstraction: the network could be
represented in an equivalent but much less complex form, as groups
and the relationships between them [3]. Nowadays, group finding
techniques allow not only simplification of the network but also
analysis of certain processes at the micro and macro scale. There are
many definitions of a group, but usually it is assumed that a group
is a set of vertices which communicate with each other more
frequently than with vertices outside the group. Many methods of
finding groups (mainly in static graphs) have been proposed [4]. More
recently, many results have been published on the dynamics of the
network, taking into account time and its impact on the life cycle of
groups [5], [2]. Palla et al. identified the basic events that may
occur in the life cycle of a group: growth, merging, birth,
contraction, splitting and death. They did not give any additional
conditions. Asur introduced formal definitions of five critical
events. Greene [6] presented a review of the fundamental events
describing group evolution and formulated these key events in terms
of rules.

In [7], a tool for visualizing the evolution of non-overlapping
groups was proposed. With this tool one can analyse the membership of
individuals in groups rather than the evolution of the group itself.

III. The Method of Group Extraction in a Dynamic Environment

We used the SGCI (Stable Group Changes Identification) algorithm,
with CPM (Clique Percolation Method) [8] as the group extraction
method. The algorithm consists of four main steps: identification of
short-lived groups in each separate time interval; identification of
group continuation (using the modified Jaccard measure); separation
of the stable groups (those lasting for a certain time interval); and
identification of the types of group changes (transitions between the
states of a stable group). A detailed description of the algorithm is
given in [9].

We used the set of events identified in [9], applying more general
methods for their identification. The algorithm identifies
transitions between the groups observed at time t and the groups
observed at time t + 1 (their successors). This is achieved by
comparing the size of each source group with the size of each of its
successors, rather than the differences in size among all successors.

For various reasons, it is interesting to observe the lifespan of
communities. How does a social network evolve? Why do communities
appear in a social network, how do they grow or shrink, and what
causes new members to join and old ones to leave? Is a community
observed in two time periods still the same community even if, for
example, the two observations have no members in common?

There are many interesting questions, but the available tools lack a
simple, preferably graphical, way of analysing the life cycle of
groups. A tool that could be used for both quantitative and
qualitative analysis, presenting a graphical visualization of events
and changes in the network, would be highly desirable.

IV. Tool for Graphical Analysis of Network Evolution (GEVi)

GEVi visualizes groups in time slots and displays the transitions
between them in the form of a graph. Each distinct hierarchy of group
evolution is displayed as a separate graph. The visualisation was
implemented with JGraph¹, a Java-based library.

A. Visualisation technique 

The groups and the transitions between them are drawn using a
hierarchical (Sugiyama-type) layout [10]. It has several useful
features: there are few edge crossings, the nodes are evenly
distributed and the edges are as straight as possible. The Sugiyama
layout is a method for drawing directed graphs and consists of the
following stages:

• cycle removal - some edges are reversed in order to make the graph
acyclic (at the end of the algorithm they are restored to their
initial direction),

• layer assignment - the vertices are assigned to layers (if an edge
spans more than two adjacent layers, dummy vertices are introduced),

• crossing reduction - in each layer the ordering of vertices is
calculated in order to minimize the number of edge crossings,

• coordinate assignment - the vertices are positioned so that they do
not overlap and do not lie on the straight lines between two adjacent
vertices from different layers; the edges are then placed.

¹ http://www.jgraph.com/jgraph.html

In our case, the transitions between groups cannot form cycles, so
the first stage was omitted. The second stage was simple because each
group belongs to the time slot in which it was extracted: the layers
of the graph represent the time slots, so the nodes were preassigned
to their layers. For crossing reduction and coordinate assignment,
variants of the median method described by Gansner [11] were used.
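
The crossing-reduction stage with preassigned layers can be illustrated by a minimal Python sketch of the median heuristic. It performs a single top-down sweep only and is not the variant implemented in GEVi or JGraph; the function name `median_order` and the input format (`layers`, `edges`) are assumptions made for this illustration.

```python
from statistics import median_low


def median_order(layers, edges):
    """One top-down sweep of the median heuristic (illustrative sketch).

    layers: list of lists of node labels; layer i holds the groups of slot i.
    edges:  iterable of (source, target) pairs between adjacent layers.
    Each layer except the first is re-sorted by the median position of the
    predecessors of every node in the layer above.
    """
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)

    result = [list(layers[0])]
    for layer in layers[1:]:
        pos = {node: i for i, node in enumerate(result[-1])}

        def key(node, current=layer, pos=pos):
            above = [pos[p] for p in preds.get(node, []) if p in pos]
            # Nodes without predecessors keep their current position.
            return median_low(above) if above else current.index(node)

        result.append(sorted(layer, key=key))
    return result


layers = [["31_0", "31_1"], ["32_0", "32_1"]]
edges = [("31_0", "32_1"), ("31_1", "32_0")]
ordered = median_order(layers, edges)
```

In this toy input the two transitions cross in the original ordering; after the sweep the second layer is reordered so that they no longer do. A full implementation would alternate downward and upward sweeps and keep the ordering with the fewest crossings.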

B. Features 

In GEVi, each group is labelled in the form
timeslotNumber_groupNumber, which eases the identification of groups
during their evolution. GEVi not only enables the analysis of
transitions between groups in different time slots (Fig. 2) but also
shows the size of each group (in square brackets inside the vertex),
the number of members passed along each transition (the label on the
transition), and the number of members that enter or leave a group
outside of any transition. The last two values are shown next to
green arrows: the arrow pointing towards the top-right corner gives
the number of members that leave the group without moving to any of
the groups connected by outgoing transitions, and the arrow pointing
towards the bottom-right corner gives the number of members that
enter the group without coming from any predecessor. For instance,
group 92_1 in Fig. 2 has 2 incoming edges (96 members flow to it from
its predecessors) and 9 additional members (not belonging to any
predecessor) join it. The group has 3 outgoing edges (100 members
flow to its successors) and 5 additional members leave it.
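
The counts shown around a vertex can be sketched in a few lines of Python. This is only an illustration under the assumption that the label on a transition equals the number of members shared by the two groups it connects; the function name `member_flow` and the toy groups are hypothetical and are not taken from GEVi.

```python
def member_flow(group, predecessors, successors):
    """Return the four counts displayed around one group vertex.

    group:        set of members of the group
    predecessors: dict label -> member set of each predecessor group
    successors:   dict label -> member set of each successor group
    Assumption: a transition label is the size of the intersection.
    """
    incoming = {name: len(group & m) for name, m in predecessors.items()}
    outgoing = {name: len(group & m) for name, m in successors.items()}
    # Members that appear without coming from any predecessor.
    joined = len(group - set().union(*predecessors.values()))
    # Members that disappear without moving to any successor.
    left = len(group - set().union(*successors.values()))
    return incoming, joined, outgoing, left


flow = member_flow(
    {"a", "b", "c", "d"},
    predecessors={"91_0": {"a", "b", "x"}, "91_1": {"c"}},
    successors={"93_0": {"a", "b", "c", "y"}},
)
```

Because groups may overlap, a member can be counted on more than one transition, so the sum of the transition labels need not equal the group size.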

Some transitions are displayed as dashed arrows. This indicates that
the groups connected by the transition differ significantly in size
(one of them is at least 10 times bigger than the other). Such
transitions represent events described as addition or deletion
(depending on whether a small group attaches to a larger one or
detaches from it).

The pop-up menu of a transition gives additional information about
the stability of the group transition, and the pop-up menu of a group
(Fig. 2) lists the members of the group.

GEVi also gives information about members shared between groups.
After a group is selected, all other groups that have at least one
member in common with it are highlighted (Fig. 3). The number of
common members is displayed inside each highlighted vertex (the
number between the characters < and >), and the pop-up menu lists the
members that the highlighted groups share with the selected one.

Figure 2. Visualisation showing the context menu for a group.

Figure 3. Visualisation showing common members for group 92_3.

For convenience, GEVi also supports zooming and searching for a group
by its name in the form timeslotNumber_groupNumber (after the group
is found, it receives the focus and the view is centered on it).

Figure 4. Search for a specific group in the visualisation.

C. Model 

In this section a simple model for describing the analysis of network
dynamics is proposed.

A complex network or a social network may be described using the
standard definition of a graph:

$$N = (V, E) \tag{1}$$

where $V \subset \mathbb{N}$ stands for a finite set of vertices,
that is:

$$V = \{ i : i \in \mathbb{N},\ i < i_{max} \} \tag{2}$$

and $E \subseteq V \times V$ is a finite set of edges.

To provide means for observing the groups formed at a certain time
moment, let us consider the following space of system states:
$G = 2^V$. The elements of $G$ are all possible subsets of $V$.
Observing the system at a certain time moment $t$, we may see that
the set of vertices is decomposed into the following subsets:

$$G \supseteq g_t = \{ g_{t,k} \}, \quad t, k \in \mathbb{N} \tag{3}$$

Each subset may be described as:

$$g_{t,k} = \{ v_1, \dots, v_{max_{t,k}} \} \tag{4}$$

where $max_{t,k}$ stands for the maximum number of individuals in the
group. Note that, in general, the subsets observed at a certain time
$t$ may contain the same elements (they may overlap). If they fulfil
the following condition:

$$\forall t\ \forall k\ \forall j:\ j \neq k \Rightarrow g_{t,k} \cap g_{t,j} = \emptyset \tag{5}$$

then the groups of vertices at a given time moment do not overlap.

Now, let us define the graph depicting the dynamics of the complex
network. As it is a graph, the definition is similar to the classical
one:

$$D = (V_d, E_d) \tag{6}$$

where $V_d = \{ (t, k) \} \subseteq \mathbb{N} \times \mathbb{N}$ and
$E_d \subseteq V_d \times V_d$, so this graph is built from the
labels used above in the definition of the complex network and the
groups. Note that this definition spans the whole observation time of
the network.

The simple formalism presented above is intended to ease the
definition of observed events and other primitives.

As an example, let us use the primitives defined above to construct
the maximum path describing the history of a group's development,
showing all the groups that contain at least one vertex from the
first one, $(t_x, k_x)$, in the newly defined dynamic graph. First,
let us consider any path $p_i$ fulfilling the above-mentioned
condition:

(7)

Now let us choose the path with maximum length:

$$mp = \arg\max p_i \tag{8}$$
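
A minimal Python sketch of this construction is given below. It assumes that a path may only pass through groups sharing at least one member with the first group, which is one reading of the condition above; the names `longest_history`, `members` and `transitions` are hypothetical. Because transitions always lead from one time slot to the next, the dynamic graph is acyclic and the longest path can be found by memoized recursion.

```python
from functools import lru_cache


def longest_history(start, members, transitions):
    """Longest path from `start` through groups sharing a member with it.

    members:     dict (t, k) -> set of vertices of group g_{t,k}
    transitions: dict (t, k) -> list of successor labels (t + 1, l)
    """
    first = members[start]

    @lru_cache(maxsize=None)
    def best(node):
        # Only successors that still contain a member of the first group.
        options = [best(nxt) for nxt in transitions.get(node, ())
                   if members[nxt] & first]
        return (node,) + max(options, key=len, default=())

    return list(best(start))


members = {(1, 0): {1, 2, 3}, (2, 0): {1, 4}, (2, 1): {5, 6}, (3, 0): {1, 7}}
transitions = {(1, 0): [(2, 0), (2, 1)], (2, 0): [(3, 0)], (2, 1): [(3, 0)]}
path = longest_history((1, 0), members, transitions)
```

In the toy data, group (2, 1) shares no member with the first group and is therefore skipped.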



For example, let us define the modified Jaccard measure $MJ(A, B)$
(9) and the ratio of group sizes:

$$ds(A, B) = \max\left\{ \frac{|A|}{|B|}, \frac{|B|}{|A|} \right\} \tag{10}$$

where $A \neq \emptyset \wedge B \neq \emptyset$.

A transition $t_{g_{i,k}, g_{i+1,l}}$ can be defined as:

$$t_{g_{i,k}, g_{i+1,l}} : MJ(g_{i,k}, g_{i+1,l}) > th \tag{11}$$

where $th$ is the threshold for creating a transition (in the
experiments $th$ was set to 0.5).

Due to the limited space in this article, only the split and
split_merge events are also described in words (Fig. 1 illustrates
most events).

Now we can label transitions:

• addition:

$$t_{g_{i,k}, g_{i+1,l}} : |g_{i+1,l}| / |g_{i,k}| > sh \tag{12}$$

• deletion:

$$t_{g_{i,k}, g_{i+1,l}} : |g_{i,k}| / |g_{i+1,l}| > sh \tag{13}$$

• merge:

$$t_{g_{i,k}, g_{i+1,l}} : ds(g_{i,k}, g_{i+1,l}) < sh \wedge [\exists t_{g_{i,m}, g_{i+1,l}} : m \neq k \wedge ds(g_{i,m}, g_{i+1,l}) < sh] \wedge [\nexists t_{g_{i,k}, g_{i+1,n}} : n \neq l \wedge ds(g_{i,k}, g_{i+1,n}) < sh] \tag{14}$$

• split: occurs when a group divides into 2 or more groups in the
next time slot and these groups have a similar size to the group that
divides

$$t_{g_{i,k}, g_{i+1,l}} : ds(g_{i,k}, g_{i+1,l}) < sh \wedge [\exists t_{g_{i,k}, g_{i+1,n}} : n \neq l \wedge ds(g_{i,k}, g_{i+1,n}) < sh] \wedge [\nexists t_{g_{i,m}, g_{i+1,l}} : m \neq k \wedge ds(g_{i,m}, g_{i+1,l}) < sh] \tag{15}$$

• split_merge: occurs when group $g_{i,k}$ divides into 2 or more
groups in the next time slot, these groups have a similar size to
$g_{i,k}$, the group $g_{i+1,l}$ is created from 2 or more groups
from the previous time slot, and these groups have a similar size to
$g_{i+1,l}$

$$t_{g_{i,k}, g_{i+1,l}} : ds(g_{i,k}, g_{i+1,l}) < sh \wedge [\exists t_{g_{i,m}, g_{i+1,l}} : m \neq k \wedge ds(g_{i,m}, g_{i+1,l}) < sh] \wedge [\exists t_{g_{i,k}, g_{i+1,n}} : n \neq l \wedge ds(g_{i,k}, g_{i+1,n}) < sh] \tag{16}$$

• constancy:

$$t_{g_{i,k}, g_{i+1,l}} : abs(|g_{i,k}| - |g_{i+1,l}|) < dh \wedge [\nexists t_{g_{i,m}, g_{i+1,l}} : m \neq k \wedge ds(g_{i,m}, g_{i+1,l}) < sh] \wedge [\nexists t_{g_{i,k}, g_{i+1,n}} : n \neq l \wedge ds(g_{i,k}, g_{i+1,n}) < sh] \tag{17}$$

• change_size:

$$\dots \wedge [\nexists t_{g_{i,m}, g_{i+1,l}} : m \neq k \wedge ds(g_{i,m}, g_{i+1,l}) < sh] \wedge [\nexists t_{g_{i,k}, g_{i+1,n}} : n \neq l \wedge ds(g_{i,k}, g_{i+1,n}) < sh] \tag{18}$$

• decay:

$$\nexists\, t_{g_{i,k}, g_{i+1,l}} \tag{19}$$

In the above definitions, abs denotes the absolute value function,
sh is the threshold for the ratio of group sizes and dh is the
threshold for the difference of group sizes. In the experiments sh
was set to 10 and dh to 3.
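
The labelling rules (12)-(19) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SGCI implementation: the existence quantifiers are read as "another similar-size predecessor or successor exists", change_size is taken to be the complement of constancy with respect to dh, and the boundary case where the size ratio equals sh exactly is grouped with the similar-size events. The names `label_transition` and `is_decay` and the toy group sizes are hypothetical.

```python
SH = 10  # threshold for the ratio of group sizes (value from the experiments)
DH = 3   # threshold for the difference of group sizes


def ds(a, b):
    """Ratio of group sizes for two non-empty groups given by their sizes."""
    return max(a / b, b / a)


def label_transition(src, dst, size, transitions, sh=SH, dh=DH):
    """Label the transition src -> dst; `size` maps group label -> size."""
    a, b = size[src], size[dst]
    if b / a > sh:
        return "addition"
    if a / b > sh:
        return "deletion"
    # Another predecessor of dst / successor of src with a similar size.
    other_pred = any(m != src and n == dst and ds(size[m], b) < sh
                     for m, n in transitions)
    other_succ = any(m == src and n != dst and ds(a, size[n]) < sh
                     for m, n in transitions)
    if other_pred and other_succ:
        return "split_merge"
    if other_pred:
        return "merge"
    if other_succ:
        return "split"
    # Assumption: change_size is the complement of constancy.
    return "constancy" if abs(a - b) < dh else "change_size"


def is_decay(group, transitions):
    """A group decays when it has no outgoing transition."""
    return not any(src == group for src, _ in transitions)


size = {"1_0": 100, "1_1": 5, "1_2": 90, "2_0": 104}
t1 = {("1_0", "2_0"), ("1_1", "2_0")}
t2 = t1 | {("1_2", "2_0")}
```

With the transitions in `t1`, the small group 1_1 is attached to 2_0 (addition) while 1_0 only changes size; adding the similar-size predecessor 1_2 in `t2` turns the transition from 1_0 into a merge.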

V. Graphical analysis of Enron dataset 

A. Dataset 

We analyzed one of the most popular datasets in complex network
analysis: the Enron emails. The dataset was prepared in the form of a
MySQL database and described by Shetty and Adibi [12], who made it
publicly available².

The analyzed data contains 252 759 messages from 151 users, sent in
the period 5.01.1998-3.02.2004. Some messages were sent to a group of
people; such messages can be expanded into multiple messages between
a single sender and a single recipient. The database contains
2 064 442 such expanded messages. We restricted the messages to those
exchanged only between employees (who are listed in a separate table
of the database). After this filtering, 50 572 expanded messages were
left.
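
The expansion and filtering steps can be sketched as follows. The tuple format `(sender, recipients)` and the function name `expand_messages` are assumptions made for this illustration; the actual schema of the MySQL database is not reproduced here.

```python
def expand_messages(messages, employees):
    """Expand multi-recipient messages and keep employee-to-employee pairs.

    messages:  iterable of (sender, list_of_recipients)
    employees: set of addresses listed as employees
    Returns (all expanded pairs, pairs exchanged between employees only).
    """
    pairs = [(sender, recipient)
             for sender, recipients in messages
             for recipient in recipients]
    internal = [(s, r) for s, r in pairs
                if s in employees and r in employees]
    return pairs, internal


messages = [("a", ["b", "c", "x"]), ("x", ["a"])]
pairs, internal = expand_messages(messages, employees={"a", "b", "c"})
```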

B. Group extraction and evolution 

The analyzed period was divided into time slots, each lasting 30
days. Neighbouring slots overlap by 50% of their duration, which
gives 149 time slots in the examined period.
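
The slot boundaries can be generated with a short Python sketch. It assumes that slots are numbered from 0, that the first slot starts on the first day of the analyzed period (5.01.1998) and that a slot is created for every start date up to the last day (3.02.2004); the function name `time_slots` is hypothetical. Under these assumptions the sketch yields 149 slots and reproduces the dates quoted in the text for slots 31 and 108.

```python
from datetime import date, timedelta


def time_slots(first_day, last_day, length_days=30, overlap=0.5):
    """Return a list of (start, end) dates of overlapping time slots."""
    step = timedelta(days=int(length_days * (1 - overlap)))
    length = timedelta(days=length_days)
    slots, start = [], first_day
    while start <= last_day:
        slots.append((start, start + length))
        start += step
    return slots


slots = time_slots(date(1998, 1, 5), date(2004, 2, 3))
```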

After defining the time slots, we extracted the groups in each of
them. We used the CPM method of community extraction (the CPMd
version from the CFinder³ tool) for k=3. For this parameter, groups
were found in slots 31 (15.04.1999-15.05.1999) to 108
(13.06.2002-13.07.2002); in the other time slots there were so few
messages between users that no groups were formed (for higher values
of k the range of time slots containing any groups is even narrower).
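
The idea of clique percolation for k=3 can be illustrated with a small pure-Python sketch. It treats the network as undirected and unweighted, so it is not the CPMd variant of CFinder used in the experiments; the function name `cpm_k3` is hypothetical. Triangles are the 3-cliques, two triangles are adjacent when they share an edge (k-1 = 2 vertices), and each community is the union of a connected set of triangles.

```python
from itertools import combinations


def cpm_k3(edges):
    """Clique percolation for k = 3 on an undirected graph (sketch)."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # All triangles, each stored once as a frozenset of its 3 vertices.
    triangles = list({frozenset((u, v, w))
                      for u in adj for v in adj[u] for w in adj[u] & adj[v]})

    # Union-find over triangles that share an edge.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_edge = {}
    for idx, tri in enumerate(triangles):
        for pair in combinations(tri, 2):
            by_edge.setdefault(frozenset(pair), []).append(idx)
    for sharing in by_edge.values():
        for other in sharing[1:]:
            parent[find(other)] = find(sharing[0])

    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return list(groups.values())


overlapping = cpm_k3([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)])
percolated = cpm_k3([(1, 2), (2, 3), (1, 3), (2, 4), (3, 4)])
```

The first example yields two communities that overlap in vertex 3, which shows why one user may belong to several groups in the same time slot; in the second, the two triangles share an edge and percolate into a single community.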

Transitions between groups were assigned using the SGCI method
described earlier. The threshold on the modified Jaccard measure was
set to 0.5.

² http://www.isi.edu/~adibi/Enron/Enron.htm
³ http://www.cfinder.org/



C. Group sizes

Simple statistics show that most of the groups are small: groups of
size 10 or less constitute about 80% of all groups (Fig. 5).

Figure 5. Number of groups with a given size.

Using GEVi, we can observe the size of each group, as demonstrated in
Fig. 6. For instance, group 92_1 has 103 members and group 93_4 has
3.

D. Number of groups in timeslots 

Fig. 6 shows the number of groups and messages in time slots. The
stars on the chart mark key events from the Enron timeline:

• 12.02.2001 - Skilling is named CEO (slots 74, 75),
• 14.08.2001 - Skilling resigns as CEO (slots 86, 87),
• 2.12.2001 - Enron files for bankruptcy (slots 94, 95).

We were inspired by the work of Collingsworth, Menezes and Martins
[13], who also analyzed the Enron dataset. Their paper presents a
chart relating the number of emails sent by users to the key events
for the company (the same events as listed above). They noticed that
peaks in the number of exchanged emails preceded the key real events
by 2 months on average. We therefore prepared a similar chart,
presenting the number of messages over time (we show only the
messages exchanged between employees).

Comparing the 2 charts in Fig. 6, we can observe that the peaks in
the number of messages precede the mentioned events. In the number of
groups, the peaks precede the events in the first 2 cases, while the
last peak comes right after the last event.

The number of groups in each time slot is easy to perceive: the
groups from the same time slot in the same hierarchy are positioned
vertically, one above the other.

E. Stability of groups in timeslots 

Fig. 7 presents the mean stability (with standard deviation) between
groups in consecutive slots (e.g., the stability in slot 100
corresponds to the stabilities between the groups from slot 100 and
slot 101). We can observe that stability gradually decreases in time
until slot 100 (13.02.2002-15.03.2002), about 3 months after the
bankruptcy of Enron. We can also notice that when the mean stability
decreases, the standard deviation is large, which is caused by many
deletions and additions.

Figure 6. Number of groups and messages in time slots.

Figure 7. Stability of groups in time slots (mean and standard deviation).

The stability of each transition between groups can be observed in
GEVi directly, by hovering the mouse pointer over a chosen group (see
Fig. 8), or indirectly: if a given time slot contains more dashed
transition arrows, its mean stability is expected to be lower than in
time slots with mainly solid arrows. This is visible in Fig. 8 (the
mean stability between the groups in slots 99 and 100 is lower than
between the groups in slots 100 and 101).

F. Exchange of members of group in time 

Four different hierarchies can be visualised in GEVi. The most
interesting one is shown in Fig. 9, where the highlighted groups are
those having at least one member in common with the first group in
this hierarchy (the group labelled 31_0). This group has 3 members
and, as we can notice, in each subsequent time slot (every time slot
has a separate vertical layer in the visualization) there is at least
one group that shares members with it. In the last time slot of this
hierarchy (slot 102) only one person from the initial group is
present.

Figure 8. Stability for a chosen transition in the visualisation.

This example shows how the tool can be used to analyse how long a
given group can exist without a complete exchange of its initial
members.
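
The same question can also be answered numerically. The sketch below counts, for every time slot of one hierarchy, how many members of the initial group are still present in any group of that slot; the function name `initial_member_survival` and the toy data are hypothetical and do not come from the Enron results.

```python
def initial_member_survival(initial, slots):
    """Count members of `initial` still present in each time slot.

    initial: set of members of the first group in the hierarchy
    slots:   dict slot number -> list of member sets of that slot's groups
    """
    survival = {}
    for slot in sorted(slots):
        present = set().union(*slots[slot]) & initial
        survival[slot] = len(present)
    return survival


survival = initial_member_survival(
    {"a", "b", "c"},
    {31: [{"a", "b", "c"}],
     32: [{"a", "b", "x"}, {"c", "y"}],
     33: [{"a", "z"}]},
)
```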

G. Common members between groups in the same time slot 

The maximum number of common members in any pair of groups from the
same time slot equals 3. In about 22% of all such pairs there is at
least one common member. Figure 10 summarizes the common members in
group pairs from the same time slot.

Figure 11 shows the distribution of common members between pairs of
groups over time. The slot with the highest number of common members
in relation to the number of group pairs is slot 95: there is a peak
for 3 common members and a similar value for 1 common member between
groups in that time slot. The bankruptcy of Enron occurred in this
slot.

Figure 10. Common members in group pairs.

GEVi makes it possible to check the elements that a selected group
has in common with the other groups. For instance, in Fig. 3 we can
see that group 92_3 has 5 members; it has 2 members in common with
group 92_1, 1 with group 92_2 and none with group 92_0.

Figure 11. Common members in group pairs in time slots.
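
Statistics of this kind can be computed with a few lines of Python. The sketch below builds a histogram of the number of common members over all pairs of groups from the same time slot; the function name `common_member_histogram` and the toy groups are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_member_histogram(groups_by_slot):
    """Histogram: number of common members -> number of group pairs.

    groups_by_slot: dict slot number -> list of member sets.
    Only pairs of groups from the same time slot are compared.
    """
    hist = Counter()
    for groups in groups_by_slot.values():
        for a, b in combinations(groups, 2):
            hist[len(a & b)] += 1
    return hist


hist = common_member_histogram({92: [{1, 2, 3}, {3, 4}, {5, 6}]})
# Share of pairs with at least one common member.
share = 1 - hist[0] / sum(hist.values())
```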

H. Overlapping groups in the same time slot 

Most groups overlap with at least one other group in the same time
slot. The groups that do not overlap with any other constitute about
30% of all groups.

Figure 12. Number of groups that overlap with a given number of other groups in the same time slot.

The presented tool makes it possible to check group overlapping
within a time slot. Referring to Fig. 3, one can see that group 92_3
overlaps with 2 other groups in the same time slot.

I. Membership of people in groups in time slots

Figure 13 shows how many people belong to a given number of groups in
the same time slot.

J. Analysis of group dynamics close to the Enron bankruptcy

The Enron bankruptcy took place in the 94th and 95th time slots (the
slots overlap). Fig. 14 shows that right before that event there were
several small groups and one big group in each time slot, but in the
95th time slot the large group 94_2 splits into several smaller
groups. This could suggest that people were worried about their
situation and preferred to communicate with a subgroup of people who
seemed more trustworthy. We can also observe that after some time the
situation changes and most people again belong to one large group.
Another interesting observation is that in the 96th time slot (just
after the Enron bankruptcy) there is a peak in the number of groups,
which could also indicate that people were uncertain about their
situation and interacted with others (via email) in small groups.

Figure 9. Visualisation of groups that have common members with the first group in the hierarchy.

Figure 13. Number of people belonging to a given number of groups in time slots.

VI. Conclusion 

This paper described the features of GEVi. The tool makes it possible
to construct charts at a higher level of abstraction and to use them
for visualizing certain group events. In the future we plan to add
the detection of new events and to use different benchmark and
real-world data to tune the proposed network analysis tool.

Figure 14. Group dynamics close to the Enron bankruptcy.



